The opinions expressed in this paper do not reflect the
position or opinion of any organization or person other
than the authors themselves, and no official endorsement
should be inferred.



This paper is a detailed history of the implementation and evaluation
of a federally funded compensatory program in a local school district.
As is the case with most federal projects, the program design was
revised on almost a day-to-day basis. The consequences of these
revisions and associated events for the corresponding evaluation of the
project are described in this paper. These events are discussed and
several recommendations concerning future public school evaluation
efforts are made.

The Emergency School Aid Act (ESAA) was approved by Congress
in 1972 to provide assistance to school districts involved in the
desegregation process. Some of these ESAA funds were earmarked by
Congress to finance pilot projects that would implement promising
educational innovations. These funds were to provide needed
compensatory educational aid and to finance the evaluation of these
innovations in the hope that successful ideas would be replicated on
larger scales.

When the district was notified of the availability of ESAA monies, a
district evaluation unit was being established. Only coincidentally, the
person eventually responsible for heading up this evaluation unit also
had a major role in designing the ESAA pilot project proposal. A local
question (which also seemed to apply nationally) concerned the use of
classroom aides in compensatory education programs and the concurrent
effects on student behavior. Therefore, on the basis of local needs
and federal guidelines, a pilot program was designed to test the
following hypothesis:

Students in classes with trained reading instructional aides will 
learn to read better than students in classes with untrained 
general aides and also better than students in classes with no 
aides at all. 



The evaluator who was designing the program had recently read an
article by Messrs. Campbell and Erlebacher¹ in which an elegantly stated
case was made for "random assignment of children to treatments where
this is possible" in compensatory educational programs. Persuaded by
their rational arguments, the evaluator/designer decided to apply their
suggestions to the program she was designing. Elementary schools
designated as ESEA Title I constituted the target area for the project.
Teacher volunteers for the project were to be solicited from the target
area schools such that volunteer teachers could be randomly assigned to
each of three groups: one group of teachers would receive the services
of a trained reading instructional classroom aide; another group would
receive the services of an untrained general aide; and the third group
of teachers would receive no aide services at all. The resulting design
was a pre-posttest randomized control group design described by Campbell
and Stanley.² This design had the important advantage of eliminating
school effects:

     R   O1   X1   O2
     R   O3   X2   O4
     R   O5        O6

where X1 = instructional reading aides
      X2 = untrained general aides

FIGURE 1
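
The random assignment of volunteer teachers to the three groups in this
design can be sketched in a few lines of code. This is only an
illustration under stated assumptions: the teacher roster, the seed, and
the function name assign_teachers are hypothetical, and the sketch does
not block by school, which the paper does not describe in detail.

```python
import random


def assign_teachers(teachers, groups, seed=None):
    """Randomly assign teachers to groups in (nearly) equal numbers."""
    rng = random.Random(seed)          # seeded so the draw can be audited
    shuffled = list(teachers)
    rng.shuffle(shuffled)              # random order removes self-selection
    assignment = {group: [] for group in groups}
    for index, teacher in enumerate(shuffled):
        # Deal teachers out to the groups in rotation after shuffling.
        assignment[groups[index % len(groups)]].append(teacher)
    return assignment


# The three conditions named in the hypothesis.
GROUPS = ["trained reading aide", "untrained general aide", "no aide"]

# Hypothetical roster of twelve volunteer teachers.
volunteers = [f"teacher_{i:02d}" for i in range(1, 13)]

design = assign_teachers(volunteers, GROUPS, seed=1972)
for group, members in design.items():
    print(group, len(members), sorted(members))
```

Because assignment depends only on the shuffle, no teacher or principal
can choose a condition, which is the property the original design relied
on.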

¹Campbell, D. T., & Erlebacher, A. How Regression Artifacts in Quasi-
Experimental Evaluations Can Mistakenly Make Compensatory Education
Look Harmful. In J. Hellmuth (Ed.), Disadvantaged Child - Compensatory
Education: A National Debate (Vol. 3). New York: Brunner/Mazel
Publishers, 1970.

²Campbell, D. T., & Stanley, J. C. Experimental and Quasi-experimental
Designs for Research. Chicago: Rand McNally & Company, 1970.
(Reprinted from Handbook of Research on Teaching, 1963.)



This design never made it off the drawing board. One objection to
it was that the differentiation of aide services among teachers in the
same building would not be tolerated by those teachers who were not also
receiving the services of trained instructional reading aides. Another
objection was that confining the program to only elementary schools
would not promote instructional continuity from elementary to secondary
school levels.

The design had to be altered in response to both of these objections.
It was decided to abandon the randomized assignment of classes
to treatment groups, and instead to assign complete schools to treatment
groups. It was also decided to include a junior high school in the
treatment group. Because of the limited number of aides provided by
ESAA funds, the result of these design alterations was to reduce the
number of schools in the treatment group from nineteen elementary schools
to two elementary schools and one junior high school. This design, a la
Campbell and Stanley, is represented below:

where X1 = instructional reading aides
      X2 = untrained general aides

FIGURE 2

The two lowest-achieving elementary schools and the lowest-achieving
junior high school in the district, which needed compensatory services
the most, were selected as the experimental schools to receive the
services of trained classroom reading instructional aides. Two Title I
elementary schools which already had general classroom aides through the
auspices of another federal compensatory program were selected as the
elementary general aide comparison schools. The second lowest-achieving
junior high school in the district was selected to receive the services
of untrained general aides and to serve as the general aide secondary
comparison school. The schools selected as the no-aide comparison group
were two elementary schools which, unfortunately for the design, ranked
in the top fourth, academically, of Title I schools in the district.
The third lowest-achieving junior high school in the district was
selected as the no-aide secondary comparison school. (Standardized
achievement test scores of students at this third junior high school were,
however, approximately one full year higher than those at either the
experimental or the general aide comparison junior high schools.)

The limitations of the design at this point seemed insurmountable: 
at the elementary level there were only two units of analysis and at 
the junior high level only one unit. In addition, the no aide compari- 
son group was initially superior to both the experimental group and the 
general aide group. It appeared obvious to the evaluators even before
the project had been implemented that there was little hope of answering 
the research question. Yet the evaluators hoped that some information 
could be gleaned during the next two years which might offer some clues 
to the most effective use of classroom aides. Little did they know what 
was in store. 

Several unanticipated events occurred during the months just prior
to the initial project implementation which affected the program design
drastically. At the same time the ESAA Pilot funds for this project
were awarded to the district, ESAA Bilingual/Bicultural funds were also
awarded and were placed in the same three experimental schools as the
pilot project. About one month prior to the opening of school, a court
desegregation order required that sixth graders be moved from elementary
buildings and be bused to newly created sixth grade schools all over
town. Two of these sixth grade schools were housed with the junior high
experimental and general aide comparison schools. The effect this had
on the project and comparison schools was to remove sixth graders from
the elementary buildings and to incorporate them into the junior high
school buildings, thereby altering the organizational and social
structures at both levels. Then, approximately two weeks before school
started, the two elementary project school principals were reassigned
and two new principals, both young men in their first administrative
assignment, were appointed.

About two weeks after school started it became apparent that
something unusual was going on in the elementary general aide comparison
schools. Upon closer inspection, a special reading program sponsored
by a local university was discovered by the evaluation staff to be
operating in those comparison schools. This university project
utilized approximately 80 part-time undergraduate tutors. We began to
think that our experimental project schools were going to serve as a
control group for the general aide comparison group. The design was
getting more complicated:

where X1 = instructional reading aides
      X2 = untrained general aides
      X3 = new bilingual program
      X4 = sixth graders removed from elementary buildings
      X5 = sixth graders introduced into junior high buildings
      X6 = new school for sixth graders only
      X7 = new first year principals
      X8 = special university reading project

FIGURE 3



As if matters were not already bad enough, problems were discovered
with the project testing schedule. Because of understandable resistance
from schools to over-testing of students, the project test measures
were administered at pre-treatment and post-treatment times; some
"pre" measures were given a half year before the start of the project,
and the corresponding "post" measures were given halfway through the
project year. This situation is represented in Figure 4.
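
One way to see what such a schedule costs is to compute how much of the
treatment period actually falls between a "pre" and a "post" measure.
The sketch below is only an illustration: the calendar dates and the
function name exposure_fraction are assumptions made for the example and
are not taken from the project records.

```python
from datetime import date


def exposure_fraction(pre_test, post_test, start, end):
    """Share of the treatment period [start, end] that lies between tests."""
    overlap_start = max(pre_test, start)
    overlap_end = min(post_test, end)
    overlap_days = max((overlap_end - overlap_start).days, 0)
    return overlap_days / (end - start).days


# Hypothetical project year (assumed dates, for illustration only).
project_start = date(1973, 9, 1)
project_end = date(1974, 6, 1)

# Tests given exactly at the start and end of the project year.
aligned = exposure_fraction(project_start, project_end,
                            project_start, project_end)

# "Pre" measure half a year early, "post" measure halfway through the year.
misaligned = exposure_fraction(date(1973, 3, 1), date(1974, 1, 15),
                               project_start, project_end)

print(aligned)
print(round(misaligned, 2))
```

Under these assumed dates the misaligned measures capture only about
half of the treatment period, and they also include six months in which
no treatment was present at all.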

During the second project year other special programs were
introduced into the project and comparison groups, leaving the "design"
looking something like Figure 5 below:

where X1  = instructional reading aides
      X2  = untrained general aides
      X3  = new bilingual program
      X4  = sixth graders removed from elementary buildings
      X5  = sixth graders introduced into junior high buildings
      X6  = new school for sixth graders only
      X7  = new first year principals
      X8  = special university reading program
      X9  = another bilingual program
      X10 = Teacher Corps training program
      X11 = behavior modification training program
      X12 = social studies curriculum pilot project
      X13 = new reading curriculum
      X14 = social workers
      X15 = university tutors

FIGURE 5

DISCUSSION

It is not the point of this paper to belittle our own efforts nor
the efforts of other evaluators. Nor do we disagree in theory with
any criticisms of quasi-experimental and ex post facto evaluation
designs. However, warnings against regression artifacts and matched
samples seem somewhat irrelevant when applied to the real-life problems
described previously in this paper. We feel that more fundamental warnings
are needed for today's evaluators and educational decision-makers.

When programs are selected for evaluation (almost always after the
program design is completed) they are assigned to either an internal or
an external agency for evaluation. Usually, however, the program has
been designed so poorly (for evaluation purposes) that very little can
be discovered concerning its worth. These design inadequacies most
often result from very real political pressures brought to bear upon
decision-makers: they are urged to blanket a whole population with
the latest curriculum (reserving no valid subjects for control purposes)
or to introduce all at once literally dozens of resources into a few
schools (thereby concealing the relationship between individual
treatments and outcomes). In both cases, finding answers to crucial
evaluation questions is almost guaranteed to be impossible. In addition to
being a frustrating situation for evaluators and their bosses, an
evaluation of such programs under these conditions does not yield a maximum
return on taxpayers' money.

Evaluation is still an infant in the education family. Persons
not directly involved in it have developed little appreciation for those
events which can invalidate evaluation conclusions. We would like to
emphasize here a few of the basic rules of program design which must be
adhered to if needed answers are to be provided through evaluation:

BASIC RULES FOR PROGRAM DESIGNERS

1. When the merits of an educational program are to be assessed, the
treatment group must be compared to some control group in which the
treatment is not present. Otherwise, any gains or losses observed
among the treatment group cannot be unequivocally attributed to the
treatment which is being evaluated.

2. The treatment and comparison groups must be composed of the same
kind of people. Random assignment of subjects to each group is the
most reliable way to attain identity of groups, although matching can
be used if a large enough subject pool is available. If matching
is used, it must be done on a large number of variables and not on
just a few.

3. There must be a large enough number of units in each group to allow
for the plausibility of significant differences occurring between
them, and to allow for any degree of generalizability of the results.
We might point out here that if 500 students in two schools are
assigned to a treatment group and 500 students in two other schools
are assigned to a control group, the number of statistical units in
each group is two, not 500. Usually the differences among small
groups of schools due to non-treatment sources like socioeconomic
status, staff competencies, etc., are so many that any differences
between the schools due to the treatment are obscured. If a large
number of schools is not available for assignment to the treatment
and control groups, then the treatment should be randomly assigned
on either a classroom or an individual student basis, as appropriate.

4. All subjects in both the treatment and control groups must be
pretested at the same time and post-tested at the same time. (A test
administration which covers a one-month period or longer does not
qualify as being "at the same time.")

5. Treatments should not be compounded in either the experimental or
the control groups. If one curriculum is being compared with
another curriculum, other large-scale programs should not also be
distributed among the treatment and control groups.
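
The point about statistical units in the rules above can be made
concrete with a small counting sketch. It is an illustration only: the
school and student names are hypothetical, the alternating student
assignment stands in for a random draw, and the degrees-of-freedom
figure assumes a simple two-group comparison of unit means (n1 + n2 - 2).

```python
def statistical_units(assignments):
    """Count independently assigned units per group.

    `assignments` maps each assigned unit (a school, a classroom, or a
    student, whichever level actually received the assignment) to a group.
    """
    counts = {}
    for group in assignments.values():
        counts[group] = counts.get(group, 0) + 1
    return counts


def degrees_of_freedom(counts):
    """Degrees of freedom for a pooled two-group comparison of unit means."""
    return sum(counts.values()) - len(counts)


# Assignment by school: 500 students per group, but only two schools each.
by_school = {
    "School A": "treatment", "School B": "treatment",
    "School C": "control", "School D": "control",
}

# Assignment by individual student: the same 1,000 students, each one
# assigned separately (alternation here is a placeholder for a random draw).
by_student = {
    f"student_{i}": ("treatment" if i % 2 == 0 else "control")
    for i in range(1000)
}

school_units = statistical_units(by_school)
student_units = statistical_units(by_student)
df_school = degrees_of_freedom(school_units)
df_student = degrees_of_freedom(student_units)

print(school_units, df_school)
print(student_units, df_student)
```

With the same 1,000 students, assignment by school leaves two units per
group and two degrees of freedom, while assignment by student leaves 500
units per group and 998 degrees of freedom.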

These rules of program design are certainly very basic ones which
are so familiar as to probably insult most of our readers. However, we
believe that there are few educational program designs being implemented
and evaluated today which do not violate several of these obvious rules.
We think the main source of problems with program designs is that
funding agencies, local education agencies, and educators in general do
not understand the requirements for determining whether or not a
program is successful. Nor do they understand the implications of not
meeting these requirements. In order to reduce this lack of
understanding, we would make the following recommendations:

RECOMMENDATIONS TOWARD IMPROVING PROGRAM DESIGNS

1. Evaluators in local education agencies must initiate or step up
their inservice efforts with decision-makers concerning design
requirements of programs which are to be evaluated. This training
should involve school board members, district-wide administrators
and principals at the very least.

2. All preservice teacher training programs should include a required
introductory course in educational research and evaluation design 
and methodology. This would yield classroom dividends over and 
above benefits to those educational programs in which teachers 
would eventually participate. 

3. All administrator certification programs should require both
introductory and advanced courses (at least six hours total) in
educational research and evaluation design and methodology.

4. The educational research community should promote and sponsor
conferences for educators who are not directly involved in research
or evaluation, but who are responsible for program planning and
design. The purpose of these conferences would be to communicate
research and evaluation requirements in program design. For
example, the American Educational Research Association could develop
and sponsor these conferences in conjunction with the American
School Board Association and the American Association of Public
School Administrators.

5. Public school evaluation units should have approval authority on
the design of those programs which are to be evaluated. If
evaluators do not have this authority, their predicament can become a
question of ethics as described in the following situation:

A public school evaluation unit is directed to find out if
Curriculum X has a beneficial effect on student achievement, but the
evaluation staff realizes that the program design set up by the
instructional department and approved by the school board will
prohibit this question from being clearly answered. There are no control
groups established for comparison with the treatment group, or if
control groups have been established, the two groups are nowhere near
identical; other independent variables (Curriculums A, B, C, D, E, F,
G and H) are so liberally distributed throughout both the Curriculum X
group and the "control" group as to make any achievement gains in
either group totally uninterpretable with respect to Curriculum X.
Should the evaluation unit go ahead and "evaluate" the program as it
is designed, or should they refuse to do so on ethical grounds?

The above example clearly illustrates that what we must do is educate our
colleagues (who do have educational planning and budgeting responsibilities)
concerning the commitments they must make if they really want useful
evaluation information. We need to be hardnosed and persistent in these efforts,
and perhaps even decline to evaluate an impossibly designed program.

As evaluators we are also accountable, and the measure of our
effectiveness is the improvement observed in student learning. This improvement
will not occur if the information we provide decision-makers is invalid or
if it is not used as input into the decision-making process. Our frank 
conclusion is that until our program designs improve, and until our colleagues 
are taught how to use the evaluation data we provide, rational decision-making 
and accountability will not occur. 


